[DRAFT] /streaming_chat endpoint PoC #1527
I had a peek, @TamiTakamiya.
I don't think we need the `async_invoke` function as you've now moved streaming support to a new type of pipeline (`ModelPipelineStreamingChatBot`).
"inference_url": chatbot_service_url or "http://localhost:8000",
"model_id": chatbot_service_model_id or "granite3-8b",
"verify_ssl": model_service_verify_ssl,
"stream": False,
)


StreamingChatBotResponse = Any
IMO this should be StreamingChatBotResponse = AsyncGenerator
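A minimal sketch of what that alias could look like in use. The `str` chunk type and the helper names `fake_stream` and `collect` are assumptions for illustration only; the PR itself proposes the bare `AsyncGenerator`.

```python
import asyncio
from typing import AsyncGenerator

# Assumed parametrisation: the generator yields text chunks and accepts no sent values.
StreamingChatBotResponse = AsyncGenerator[str, None]


async def fake_stream() -> StreamingChatBotResponse:
    # Hypothetical stand-in for chunks arriving from the chatbot service.
    for chunk in ("Hello", ", ", "world"):
        yield chunk


async def collect(stream: StreamingChatBotResponse) -> str:
    # Drain the async generator and join the chunks into one string.
    return "".join([chunk async for chunk in stream])


result = asyncio.run(collect(fake_stream()))
```

Compared with `Any`, the alias lets a type checker flag callers that treat the response as a plain string or a synchronous iterator.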
@@ -274,6 +301,9 @@ def alias() -> str:
    def invoke(self, params: PIPELINE_PARAMETERS) -> PIPELINE_RETURN:
        raise NotImplementedError

    def async_invoke(self, params: PIPELINE_PARAMETERS) -> AsyncGenerator:
This method isn't needed (yes, yes, I know I proposed it.. but I found a better way.. I think)
    def __init__(self, config: HttpConfiguration):
        super().__init__(config=config)

    def invoke(self, params: StreamingChatBotParameters) -> StreamingHttpResponse:
Change L#222-L#225 to simply be:
    async def invoke(self, params: StreamingChatBotParameters) -> StreamingChatBotResponse:
i.e. there is no need for an `invoke` AND `async_invoke` function; just an `invoke` function.
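To illustrate the single-method idea, here is a rough sketch, not the PR's actual pipeline. `StreamingPipelineSketch`, the `dict` parameters, and the word-splitting body are all hypothetical; the point is only that an `async def` containing `yield` already returns an async generator when called, so no separate `async_invoke` is required.

```python
import asyncio
from typing import AsyncGenerator


class StreamingPipelineSketch:
    """Hypothetical stand-in for a streaming chatbot pipeline."""

    async def invoke(self, params: dict) -> AsyncGenerator[str, None]:
        # Because this body contains `yield`, calling invoke() returns an
        # async generator immediately; the caller does not await the call.
        for word in params["query"].split():
            await asyncio.sleep(0)  # stand-in for waiting on the model service
            yield word + " "


async def consume() -> list:
    pipeline = StreamingPipelineSketch()
    # The generator is iterated with `async for`, chunk by chunk.
    return [chunk async for chunk in pipeline.invoke({"query": "stream this reply"})]


chunks = asyncio.run(consume())
```

A caller that needs a streaming response can hand the object returned by `invoke(...)` to whatever consumes async iterators, without a second entry point on the pipeline.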
        },
        summary="Streaming chat request",
    )
    def post(self, request) -> Response:
Can the return type be `StreamingHttpResponse`?
self.event.modelName = self.req_model_id or self.llm.config.model_id

return StreamingHttpResponse(
    self.llm.async_invoke(
If my other changes are integrated, this can simply be `self.llm.invoke(....)`.
@@ -159,7 +159,7 @@ def assert_basic_data(
         self.assert_common_data(data, expected_status, deployed_region)
         timestamp = data["timestamp"]
         dependencies = data.get("dependencies", [])
-        self.assertEqual(10, len(dependencies))
+        self.assertEqual(11, len(dependencies))
JFYI, there's also a test in `ansible-wisdom-testing` that will break.
Jira Issue: https://issues.redhat.com/browse/AAP-39044
Description
A PoC for the `/streaming_chat` endpoint
Testing
Steps to test
Scenarios tested
Production deployment